Objective: to learn how to manipulate gridded datasets and vectorised data through the application of the most important functions in R.
Vector data stores the location, shape, and attributes of geographic features in one of the following forms:
There are different packages that can be used to load and manipulate vector data (shapefles) in R. In this lecture, we will use the sf and terra packages. The tibble package can also be installed as it is used by sf.
To install these packages we can use the install.packages function.
Once they are installed, we can load them with the library function.
To read vector data in R, we can use both packages. To use the sf package we can apply the read_sf
If the folder contains more than one shapefile, you have to specify the file path:
If we type the name of the object in the console:
countries
## Simple feature collection with 15 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 12.19545 ymin: -13.45568 xmax: 47.98618 ymax: 31.66734
## Geodetic CRS: WGS 84
## # A tibble: 15 × 10
## STATUS DISP_AREA ADM0_CODE ADM0_NAME STR0_YEAR EXP0_YEAR Shape_Leng
## <chr> <chr> <int> <chr> <int> <int> <dbl>
## 1 Member State NO 205 Rwanda 1000 3000 8.13
## 2 Member State NO 133 Kenya 1000 3000 48.5
## 3 Member State NO 74 South Su… 2011 3000 46.9
## 4 Member State NO 257 United R… 1000 3000 83.0
## 5 Member State NO 253 Uganda 1000 3000 23.2
## 6 Sovereignty uns… YES 61013 Ilemi tr… 1000 3000 2.84
## 7 Member State NO 79 Ethiopia 1000 3000 50.4
## 8 Member State NO 77 Eritrea 1000 3000 50.0
## 9 Member State NO 43 Burundi 1000 3000 9.12
## 10 Member State NO 68 Democrat… 1000 3000 99.0
## 11 Sovereignty uns… YES 40760 Hala'ib … 1000 3000 8.51
## 12 Sovereignty uns… YES 40762 Ma'tan a… 1000 3000 2.06
## 13 Member State NO 6 Sudan 2011 3000 81.9
## 14 Member State NO 40765 Egypt 1000 3000 61.3
## 15 Sovereignty uns… YES 102 Abyei 2011 3000 3.61
## # … with 3 more variables: Shape_Area <dbl>, Name_label <chr>,
## # geometry <MULTIPOLYGON [°]>Let’s plot the countries object:
The function names will return the names of all the attributes stored.
sf objects are data frames with a special “geometry” column:
st_geometry(countries)
## Geometry set for 15 features
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 12.19545 ymin: -13.45568 xmax: 47.98618 ymax: 31.66734
## Geodetic CRS: WGS 84
## First 5 geometries:
## MULTIPOLYGON (((30.47352 -1.057411, 30.47035 -1...
## MULTIPOLYGON (((39.37656 -4.721247, 39.37412 -4...
## MULTIPOLYGON (((33.02315 12.22673, 33.25586 12....
## MULTIPOLYGON (((40.21717 -10.29314, 40.21717 -1...
## MULTIPOLYGON (((34.09062 3.878785, 34.09114 3.8...Since geometries are not single-valued, they are put in a list-column, with each list element holding the simple feature geometry of that feature. The three classes used to represent simple features are:
If we want to plot the geometry only:
We can also define the limits for the x-axis and y-axis of the plot.
countries_geom <- st_geometry(countries)
plot(st_geometry(countries), axes = TRUE,
xlim = c(20, 40), ylim = c(1, 20))If we want to plot a single attribute:
Many more possibilities: https://r-spatial.github.io/sf/articles/sf5.html
Every row contains a single feature:
To get a simple feature containing only Ethopia:
ethiopia <- countries[7, ]
print(ethiopia)
## Simple feature collection with 1 feature and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 32.99994 ymin: 3.401109 xmax: 47.98618 ymax: 14.89416
## Geodetic CRS: WGS 84
## # A tibble: 1 × 10
## STATUS DISP_AREA ADM0_CODE ADM0_NAME STR0_YEAR EXP0_YEAR Shape_Leng Shape_Area
## <chr> <chr> <int> <chr> <int> <int> <dbl> <dbl>
## 1 Membe… NO 79 Ethiopia 1000 3000 50.4 92.9
## # … with 2 more variables: Name_label <chr>, geometry <MULTIPOLYGON [°]>When we want to manipulate raster and vector data altogether, the terra package is very useful. To read vector data with this package we will use the vect function.
To save the extracted feature(s) in a specific address on your system, we can use the write_sf function.
You can save into other formats with st_write. Look at the driver argument in the help.
We can get rid of the geometry with the st_drop_geometry function. Calculating new attributes (e.g., mm/m2 to total per country):
Alternatively, we can use dplyr:
countries %>%
mutate(precip_ml = runif(nrow(countries)),
precip_total = precip_ml * Shape_Area)
## Simple feature collection with 15 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 12.19545 ymin: -13.45568 xmax: 47.98618 ymax: 31.66734
## Geodetic CRS: WGS 84
## # A tibble: 15 × 13
## STATUS DISP_AREA ADM0_CODE ADM0_NAME STR0_YEAR EXP0_YEAR Shape_Leng
## * <chr> <chr> <int> <chr> <int> <int> <dbl>
## 1 Member State NO 205 Rwanda 1000 3000 8.13
## 2 Member State NO 133 Kenya 1000 3000 48.5
## 3 Member State NO 74 South Su… 2011 3000 46.9
## 4 Member State NO 257 United R… 1000 3000 83.0
## 5 Member State NO 253 Uganda 1000 3000 23.2
## 6 Sovereignty uns… YES 61013 Ilemi tr… 1000 3000 2.84
## 7 Member State NO 79 Ethiopia 1000 3000 50.4
## 8 Member State NO 77 Eritrea 1000 3000 50.0
## 9 Member State NO 43 Burundi 1000 3000 9.12
## 10 Member State NO 68 Democrat… 1000 3000 99.0
## 11 Sovereignty uns… YES 40760 Hala'ib … 1000 3000 8.51
## 12 Sovereignty uns… YES 40762 Ma'tan a… 1000 3000 2.06
## 13 Member State NO 6 Sudan 2011 3000 81.9
## 14 Member State NO 40765 Egypt 1000 3000 61.3
## 15 Sovereignty uns… YES 102 Abyei 2011 3000 3.61
## # … with 6 more variables: Shape_Area <dbl>, Name_label <chr>,
## # geometry <MULTIPOLYGON [°]>, precip_mm <dbl>, precip_total <dbl>,
## # precip_ml <dbl>A raster consists of a matrix of cells (or pixels), where each cell contains a value or set of values representing certain information.
Rasters with discrete data often use a code to represent different data types.
To open a GeoTIFF raster file (or similar), the rast command from the terra package is used.
gleam.path <- "C:/User/L3_Rasters_and_Shapefiles_Data/GLEAM_annual/GLEAM_2009.tif"
gleam <- rast(gleam.path)To be able to perform raster operations, all the raster files involved will need to have the same raster geometry:
The metadata of the raster file can be visualised by typing the name of the object where it was stored (in this case gleam).
print(gleam)
## class : SpatRaster
## dimensions : 720, 1440, 1 (nrow, ncol, nlyr)
## resolution : 0.25, 0.25 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84
## source : GLEAM_2009.tif
## name : GLEAM_2009
## min value : -23.93291
## max value : 2148.198In R, information about the raster (coordinate system, spatial extent, number of grid-cells, etc.) can be easily retrieved or introduced using the crs function from the raster package:
crs(gleam)
## [1] "GEOGCRS[\"WGS 84\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]],\n ID[\"EPSG\",6326]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8901]],\n CS[ellipsoidal,2],\n AXIS[\"longitude\",east,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433,\n ID[\"EPSG\",9122]]],\n AXIS[\"latitude\",north,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433,\n ID[\"EPSG\",9122]]]]"
crs(gleam) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84
+towgs84=0,0,0"To plot a raster we use the plot function.
Alternatively, we can use the function click to get the value of a certain pixel.
To exit the click function please press Esc.
The package terra is able to work efficiently with a large number of raster files; therefore, Creating a multi-layer objects is extremely easy.
The use of multi-layered gridded objects (stacks) is very useful in multiple cases (e.g., when working with rasters concerning environmental data such as precipitation, evapotranspiration and temperature).
We want to use these annual ETa files to:
Step 1: List all the files within the folder.
ssebop.path <- "C:/Users/Example/SSEBop_annual"
ssebop.files <- list.files(ssebop.path, full.names = TRUE, pattern = ".tif")head(ssebop.files)
## [1] "../Data/L3_Rasters_and_Shapefiles_Data/SSEBop_annual/SSEBop_2003.tif"
## [2] "../Data/L3_Rasters_and_Shapefiles_Data/SSEBop_annual/SSEBop_2004.tif"
## [3] "../Data/L3_Rasters_and_Shapefiles_Data/SSEBop_annual/SSEBop_2005.tif"
## [4] "../Data/L3_Rasters_and_Shapefiles_Data/SSEBop_annual/SSEBop_2006.tif"
## [5] "../Data/L3_Rasters_and_Shapefiles_Data/SSEBop_annual/SSEBop_2007.tif"
## [6] "../Data/L3_Rasters_and_Shapefiles_Data/SSEBop_annual/SSEBop_2008.tif"Step 2: Create a stack from all the single raster layers.
Step 3: Calculate the multi-annual mean.
We currently have the raster stack saved as the object ssebop.
print(ssebop)
## class : SpatRaster
## dimensions : 8082, 7772, 16 (nrow, ncol, nlyr)
## resolution : 0.009652, 0.009652 (x, y)
## extent : -20.00845, 55.00689, -38.00535, 40.00211 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## sources : SSEBop_2003.tif
## SSEBop_2004.tif
## SSEBop_2005.tif
## ... and 13 more source(s)
## names : SSEBop_2003, SSEBop_2004, SSEBop_2005, SSEBop_2006, SSEBop_2007, SSEBop_2008, ...
## min values : 0, 0, 0, 0, 0, 0, ...
## max values : 3272, 3353, 3299, 3296, 3462, 3333, ...Similar to a list, we use double square brackets to access a particular raster (or multiple rasters) from a raster stack.
print(ssebop[[2]])
## class : SpatRaster
## dimensions : 8082, 7772, 1 (nrow, ncol, nlyr)
## resolution : 0.009652, 0.009652 (x, y)
## extent : -20.00845, 55.00689, -38.00535, 40.00211 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : SSEBop_2004.tif
## name : SSEBop_2004
## min value : 0
## max value : 3353
print(ssebop[[1:3]])
## class : SpatRaster
## dimensions : 8082, 7772, 3 (nrow, ncol, nlyr)
## resolution : 0.009652, 0.009652 (x, y)
## extent : -20.00845, 55.00689, -38.00535, 40.00211 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## sources : SSEBop_2003.tif
## SSEBop_2004.tif
## SSEBop_2005.tif
## names : SSEBop_2003, SSEBop_2004, SSEBop_2005
## min values : 0, 0, 0
## max values : 3272, 3353, 3299The crop function extracts a portion of a gridded dataset according to a selected input (can be a raster or a vector file; i.e., SpatRaster and SpatVector, respectively).
To crop a raster:
The parameter snap is used to align the cropped file to the extent of the selected element that is used to perform the crop.
To mask a raster:
To list all the unique values in a raster object, the getValues function is used. - The output is in the form of a vector
gleam_vals <- values(gleam.mask)
gleam_vals <- as.numeric(gleam_vals)
print(gleam_vals)
## [1] NA NA NA NA NA 823.3318 742.6441
## [8] 824.1913 NA NA NA 1063.5068 1031.8060 977.4069
## [15] 837.3028 793.3273 835.6875 NA NA 1141.6630 1031.7339
## [22] 938.8115 869.1260 827.8035 774.6710 820.8055 905.7816 NA
## [29] 1192.4106 1073.8735 914.8868 840.9797 838.4015 800.3232 828.6646
## [36] 908.3577 NA 1179.2241 1033.6238 878.3455 838.2838 795.3875
## [43] 769.3068 844.0502 860.8229 1094.2697 1071.3359 979.1405 876.9600
## [50] 837.6367 830.4340 897.6795 830.6163 840.5397 966.6479 974.6377
## [57] 1011.7329 880.3939 856.8268 NA NA NA NA
## [64] NA NA 957.3956 891.5947 873.2820 NA NA
## [71] NA NAIn order to perform raster operations, the raster geometry between the layers must be the same.
The previous slide showed resampling to a different origin using the same grid-cell size.
How to resample rasters in R. - Nearest neighbour method (rasters with categorical or continous variables)
print(land.cover)
## class : SpatRaster
## dimensions : 6002, 4802, 1 (nrow, ncol, nlyr)
## resolution : 0.008329863, 0.008330556 (x, y)
## extent : 10, 50, -15, 35 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : MCD12Q1_LC1_2009_001.tif
## name : MCD12Q1_LC1_2009_001
## min value : 1
## max value : 17To resample the raster to the spatial resolution of the gleam object we will use the resample function.
print(landcover.res)
## class : SpatRaster
## dimensions : 720, 1440, 1 (nrow, ncol, nlyr)
## resolution : 0.25, 0.25 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs
## source : memory
## name : MCD12Q1_LC1_2009_001
## min value : 2
## max value : 17How to resample rasters in R. Bilinear method (rasters with continuous variables).
print(chirps)
## class : SpatRaster
## dimensions : 2000, 7200, 1 (nrow, ncol, nlyr)
## resolution : 0.05, 0.05 (x, y)
## extent : -180, 180, -50, 50 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : chirps_monthly2009-01.tif
## name : chirps_monthly2009-01
## min value : 0
## max value : 1301.122
chirps.res <- resample(chirps, gleam, method = "bilinear")print(chirps.res)
## class : SpatRaster
## dimensions : 720, 1440, 1 (nrow, ncol, nlyr)
## resolution : 0.25, 0.25 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs
## source : memory
## name : chirps_monthly2009-01
## min value : 0
## max value : 1221.876After we have altered a raster, we may want to save the new raster. This can be done with the writeRaster function (part of the terra package).
x: raster object. filename: output filename (with complete file path). overwrite: logical.
Let’s use the writeRaster function to save the resampled CHIRPS raster from the last exercise. We will save the raster as a GeoTiff file.
Question: How could you make the file path name change according to the value of a variable?
Once rasters have the same geometry, it is easy to perform mathematical operations. - Example - If we have the following three rasters:
Now, if we have these rasters in a stack:
What would be the result of the following expressions for the highlighted grid-cell:
We can identify the grid-cells with certain attributes. For example, the regions where more than 200 mm of precipitation fell during January, 2009.
For this purpose, we will set the values higher than 200 mm to equal 1 , while the values lower or equal to 200 mm will be set to a value of 0.
Another common use of this function is to create a raster where values are either 1 or NA (hereafter, we will call this a “unitary” raster). In this example, we want to calculate the average precipitation for January 2009 in the Nile Basin countries over the Arid region of the Köppen-Geiger major climate zones.
Let’s find unique values using the values function in combination with the unique function.
uniq.values <- as.numeric(unique(values(climate.zones)))
print(uniq.values)
## [1] NaN 16 13 10 7 9 17 11 14 12 8 4 2 1 5 6 15 3| Value | Climate |
|---|---|
| 1 | Tropical |
| 2 | Arid |
| 3 | Temperate |
| 4 | Polar |
First, we set the values of the arid climate zone (value = 2) to 1 and everything else to NA.
print(arid)
## class : SpatRaster
## dimensions : 6002, 4802, 1 (nrow, ncol, nlyr)
## resolution : 0.008329863, 0.008330556 (x, y)
## extent : 10, 50, -15, 35 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : memory
## name : MCD12Q1_LC1_2009_001
## min value : 1
## max value : 1
print(chirps)
## class : SpatRaster
## dimensions : 2000, 7200, 1 (nrow, ncol, nlyr)
## resolution : 0.05, 0.05 (x, y)
## extent : -180, 180, -50, 50 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : chirps_monthly2009-01.tif
## name : chirps_monthly2009-01
## min value : 0
## max value : 1301.122Let’s resample the unitary raster to the same spatial resolution of the precipitation product.
arid <- resample(arid, chirps, method = "ngb")
## Warning: [resample] argument 'method=ngb' is deprecated, it should be
## 'method=near'
print(arid)
## class : SpatRaster
## dimensions : 2000, 7200, 1 (nrow, ncol, nlyr)
## resolution : 0.05, 0.05 (x, y)
## extent : -180, 180, -50, 50 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : memory
## name : MCD12Q1_LC1_2009_001
## min value : 1
## max value : 1Now, we multiply the precipitation raster by the arid unitary raster and crop it to the climate.zones layer extent.
Finally, we want to obtain the mean precipitation for the arid climate zone.
Sometimes we want to extract specific values (according to coordinates) from a raster. For this purpose, we will use the extract function from the raster package. The first step is to create two vectors with the latitude and longitude.
Now we have to convert the coordinates to a SpatVector.
We can plot CHIRPS and the stations as follows:
To extract the values of the selected stations we will use the extract function.
extraction <- terra::extract(chirps, locations)
extraction <- extraction[2:length(extraction)]
print(extraction)
## chirps_monthly2009-01
## 1 6.863122
## 2 259.973359
## 3 183.161122
## 4 329.991304
## 5 16.177693Because the function extract might be included in another package, the double semicolons (::) indicate the package that should be used. We can also use this extract function to extract time series from a multilayer object.